...@@ -32,7 +32,7 @@ To know how to use it please use the -h command line option. ...@@ -32,7 +32,7 @@ To know how to use it please use the -h command line option.
### Test the complete pipeline on simulated data ### Test the complete pipeline on simulated data
For a complete test, we made a bunch of snakemake files. For a complete test, we made a bunch of snakemake files.
If you are looking for a complete pipeline from synthetic data generation, you should look into the "Snakefile_d2_eval" file. If you are looking for a complete pipeline from synthetic data generation, you should look into the "Snakefile_lcpg_eval" file.
You can vary the N (number of molecules in the interval graph), M (average number of merges to perform in a barcode), and DEV (standard deviation of the number of merges) variables to see their impact on performance. You can vary the N (number of molecules in the interval graph), M (average number of merges to perform in a barcode), and DEV (standard deviation of the number of merges) variables to see their impact on performance.
These values are arrays. You can enter multiple values and all the combinations will be done. These values are arrays. You can enter multiple values and all the combinations will be done.
A summary is written to the tsv file "{WORKDIR}/eval_compare_maxclique.tsv". A summary is written to the tsv file "{WORKDIR}/eval_compare_maxclique.tsv".
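As a rough illustration of what "all the combinations" means for these arrays, the sketch below expands three value lists into the full parameter grid with `itertools.product`. The function name `parameter_grid` and the sample values are assumptions made for this example; the Snakefile itself may build its targets differently (for instance with Snakemake's `expand`).

```python
from itertools import product


def parameter_grid(n_values, m_values, dev_values):
    """Return every (N, M, DEV) combination as a list of dicts."""
    return [
        {"N": n, "M": m, "DEV": dev}
        for n, m, dev in product(n_values, m_values, dev_values)
    ]


# Hypothetical values; each list may hold several entries.
grid = parameter_grid([500, 1000], [2, 3], [0])
for params in grid:
    print(params)
```

The number of runs is the product of the list lengths (here 2 x 2 x 1 = 4), which is why long lists quickly make the pipeline slow.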
...@@ -40,7 +40,7 @@ Warning: the pipeline can be very slow for huge number of parameters. ...@@ -40,7 +40,7 @@ Warning: the pipeline can be very slow for huge number of parameters.
Command to run the pipeline: Command to run the pipeline:
``` ```
snakemake -s Snakefile_d2_eval snakemake -s Snakefile_lcpg_eval
``` ```
### Data simulation ### Data simulation
...@@ -60,15 +60,17 @@ Config parameters: ...@@ -60,15 +60,17 @@ Config parameters:
### Data structures and algorithms ### Data structures and algorithms
* Create a d2 graph from barcode graph: use the snakemake "Snakefile_d2" * Create an lcp graph from barcode graph: use the snakemake "Snakefile_lcpg"
The result will be generated as a compressed file in the workdir. The result will be generated as a compressed file in the workdir.
Config parameters: Config parameters:
* input: the input barcode graph (gexf format preferred). * input: the input barcode graph (gexf format preferred).
* workdir: The working and output directory. * workdir: The working and output directory.
* to_d2_graph.py: Mount a barcode graph into memory and create a d2 graph from it. * barcode_to_lcp_graph.py: Mount a barcode graph into memory and create an lcp graph from it.
* d2_to_path.py: take a d2 graph as input and explore the nodes to extract a udg path. * lcpg_reduction.py: Perform a triplet transitive reduction on an lcp graph.
* evaluate.py: take a d2 graph gexf file and analyse it. Look for an approximation of the longest correct path to reconstruct a molecule graph. Take as input a d2 graph where the truth is known in the node names (the format used to create fake data). * lcpg_to_path.py: take an lcp graph as input and explore the nodes to extract an lcp path.
* evaluate.py: take an lcp graph gexf file and analyse it. Look for an approximation of the longest correct path to reconstruct a molecule graph. Take as input an lcp graph where the truth is known in the node names (the format used to create fake data). Can also evaluate an lcp path.
...@@ -3,7 +3,7 @@ number_try = 5 ...@@ -3,7 +3,7 @@ number_try = 5
threshold = 0.9 threshold = 0.9
rule d2_path_generation: rule lcp_path_generation:
input: input:
d2="{path}_d2_{type}_{method}.gexf" d2="{path}_d2_{type}_{method}.gexf"
output: output:
...@@ -11,7 +11,7 @@ rule d2_path_generation: ...@@ -11,7 +11,7 @@ rule d2_path_generation:
run: run:
best = 0 best = 0
for _ in range(number_try): for _ in range(number_try):
shell("python3 deconvolution/main/d2_to_path.py {input.d2} > {output}_tmp.out") shell("python3 deconvolution/main/lcpg_to_path.py {input.d2} > {output}_tmp.out")
score = 0 score = 0
with open(f"{output}_tmp.out") as out: with open(f"{output}_tmp.out") as out:
score_line = out.readlines()[-2].strip() score_line = out.readlines()[-2].strip()
......
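The loop in the rule above reruns the path extraction `number_try` times and starts from `best = 0`; the rest of the rule is elided in this diff. What follows is therefore only a generic sketch of a keep-the-best retry pattern, not the rule's actual body. `best_of` and `fake_run` are hypothetical names, and the scores are made up.

```python
def best_of(run_once, number_try=5):
    """Call run_once() number_try times and keep the highest-scoring result.

    run_once is a hypothetical callable returning a (score, payload) pair.
    """
    best_score, best_payload = 0, None
    for _ in range(number_try):
        score, payload = run_once()
        # Strict comparison: on ties, the earliest result is kept.
        if score > best_score:
            best_score, best_payload = score, payload
    return best_score, best_payload


# Deterministic stand-in for one run of the path extraction script.
scores = iter([3, 7, 5, 7, 1])


def fake_run():
    score = next(scores)
    return score, f"path-{score}"


result = best_of(fake_run, number_try=5)
print(result)
```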
...@@ -33,7 +33,7 @@ rule compress_data: ...@@ -33,7 +33,7 @@ rule compress_data:
"rm -rf {wildcards.barcode_file}/ ;" "rm -rf {wildcards.barcode_file}/ ;"
"cd - ;" "cd - ;"
rule d2_simplification: rule lcpg_simplification:
input: input:
d2_raw="{barcode_path}_d2_raw_{method}.gexf" d2_raw="{barcode_path}_d2_raw_{method}.gexf"
output: output:
...@@ -41,10 +41,10 @@ rule d2_simplification: ...@@ -41,10 +41,10 @@ rule d2_simplification:
wildcard_constraints: wildcard_constraints:
method="[A-Za-z0-9]+" method="[A-Za-z0-9]+"
shell: shell:
"python3 deconvolution/main/d2_reduction.py -o {output.simplified_d2} {input.d2_raw}" "python3 deconvolution/main/lcpg_reduction.py -o {output.simplified_d2} {input.d2_raw}"
rule d2_generation: rule lcpg_generation:
input: input:
barcode_graph=f"{WORKDIR}/{{file}}.gexf" barcode_graph=f"{WORKDIR}/{{file}}.gexf"
output: output:
...@@ -53,7 +53,7 @@ rule d2_generation: ...@@ -53,7 +53,7 @@ rule d2_generation:
wildcard_constraints: wildcard_constraints:
method="[A-Za-z0-9]+" method="[A-Za-z0-9]+"
run: run:
shell(f"python3 deconvolution/main/to_d2_graph.py {{input.barcode_graph}} --{{wildcards.method}} -t {{threads}} -o {WORKDIR}/{{wildcards.file}}_d2_raw_{{wildcards.method}}.gexf") shell(f"python3 deconvolution/main/barcode_to_lcp_graph.py {{input.barcode_graph}} --{{wildcards.method}} -t {{threads}} -o {WORKDIR}/{{wildcards.file}}_d2_raw_{{wildcards.method}}.gexf")
rule setup_workdir: rule setup_workdir:
......
include: "Snakefile_data_simu" include: "Snakefile_data_simu"
include: "Snakefile_d2" include: "Snakefile_lcpg"
include: "Snakefile_d2_path" include: "Snakefile_lcp_path"
WORKDIR = "snake_experiments" if "workdir" not in config else config["workdir"] WORKDIR = "snake_experiments" if "workdir" not in config else config["workdir"]
N = [5000] N = [500]
D = [10] D = [6]
M = [2] M = [2]
DEV = [0] DEV = [0]
...@@ -32,7 +32,7 @@ rule comparable_tsv: ...@@ -32,7 +32,7 @@ rule comparable_tsv:
splits = "/".join(path_eval_lines[-2].strip().split(': ')[-1].split(' - ')) splits = "/".join(path_eval_lines[-2].strip().split(': ')[-1].split(' - '))
print(f"{n}\t{m}\t{dev}\t{longest_path_d2}\t{greedy_path}\t{splits}", file=out) print(f"{n}\t{m}\t{dev}\t{longest_path_d2}\t{greedy_path}\t{splits}", file=out)
rule eval_d2: rule eval_lcpg:
input: input:
"{file}_d2_{type}_maxclq.gexf" "{file}_d2_{type}_maxclq.gexf"
output: output:
......
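The `splits` expression in the `comparable_tsv` rule above chains several string operations on one line. The sketch below unpacks the same chain on a made-up input; the sample line is an assumption, since the real output format of the evaluation script is not shown in this diff, and `format_splits` is a hypothetical helper name.

```python
def format_splits(line):
    """Keep the text after the last ': ' and join its ' - ' separated fields with '/'."""
    value = line.strip().split(": ")[-1]
    return "/".join(value.split(" - "))


# Hypothetical evaluation line; only the "label: a - b" shape matters here.
formatted = format_splits("Path splits: 4 - 2\n")
print(formatted)
```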