@@ -32,7 +32,7 @@ To know how to use it please use the -h command line option.
### Test the complete pipeline on simulated data
For a complete test, we made a bunch of snakemake files.
-If you are looking for a complete pipeline from synthetic data generation, you should look into the "Snakefile_d2_eval" file.
+If you are looking for a complete pipeline from synthetic data generation, you should look into the "Snakefile_lcpg_eval" file.
You can play with the N (number of molecules in the interval graph), M (average number of merge to perform in a barcode), DEV (standard deviation on merge) variables to see impact on performances.
These values are arrays. You can enter multiple values and all the combinations will be done.
A summary is output in the tsv file "{WORKDIR}/eval_compare_maxclique.tsv".
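A note on "all the combinations": the sketch below shows one plausible reading, namely that the pipeline runs once per element of the Cartesian product of the N, M and DEV lists. The values and the `parameter_grid` helper are hypothetical and are not taken from the Snakefile, which may expand the lists differently.

```python
from itertools import product

# Hypothetical values for illustration only; the Snakefile defines its own lists.
N = [500, 1000]   # number of molecules in the interval graph
M = [2, 3]        # average number of merges per barcode
DEV = [0]         # standard deviation on merges


def parameter_grid(n_values, m_values, dev_values):
    """Return every (n, m, dev) combination, one tuple per pipeline run."""
    return list(product(n_values, m_values, dev_values))


grid = parameter_grid(N, M, DEV)
for n, m, dev in grid:
    print(f"n={n}\tm={m}\tdev={dev}")
```

The number of runs is the product of the list lengths, which is why the pipeline slows down quickly as more values are added.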
@@ -40,7 +40,7 @@ Warning: the pipeline can be very slow for huge number of parameters.
Command to run the pipeline:
```
-snakemake -s Snakefile_d2_eval
+snakemake -s Snakefile_lcpg_eval
```
### Data simulation
@@ -60,15 +60,17 @@ Config parameters:
### Data structures and algorithms
-* Create a d2 graph from barcode graph: use the snakemake "Snakefile_d2"
+* Create a lcp graph from barcode graph: use the snakemake "Snakefile_lcpg"
The result will be generated as a compressed file in the workdir.
Config parameters:
* input: the input barcode graph (gexf format preferred).
* workdir: The working and output directory.
-* to_d2_graph.py: Mount a barcode graph into memory and create a d2 graph from it.
+* barcode_to_lcp_graph.py: Mount a barcode graph into memory and create a lcp graph from it.
-* d2_to_path.py: take a d2 graph as input and explore the nodes to extract a udg path.
+* lcpg_reduction.py: Perform a triplet transitive reduction on a lcp graph.
-* evaluate.py: take a d2 graph gexf file and and analyse it. Look for an approximation of the longest correct path to reconstruct a molecule graph. Take as input a d2 graph where the truth is known in the node names (the format used to create fake data).
+* lcpg_to_path.py: take a lcp graph as input and explore the nodes to extract a lcp path.
+* evaluate.py: take a lcp graph gexf file and and analyse it. Look for an approximation of the longest correct path to reconstruct a molecule graph. Take as input a lcp graph where the truth is known in the node names (the format used to create fake data). Can also evaluate a lcp path.
@@ -3,7 +3,7 @@ number_try = 5
threshold = 0.9
-rule d2_path_generation:
+rule lcp_path_generation:
input:
d2="{path}_d2_{type}_{method}.gexf"
output:
@@ -11,7 +11,7 @@ rule d2_path_generation:
run:
best = 0
for _ in range(number_try):
-shell("python3 deconvolution/main/d2_to_path.py {input.d2} > {output}_tmp.out")
+shell("python3 deconvolution/main/lcpg_to_path.py {input.d2} > {output}_tmp.out")
score = 0
with open(f"{output}_tmp.out") as out:
score_line = out.readlines()[-2].strip()
......
@@ -33,7 +33,7 @@ rule compress_data:
"rm -rf {wildcards.barcode_file}/ ;"
"cd - ;"
-rule d2_simplification:
+rule lcpg_simplification:
input:
d2_raw="{barcode_path}_d2_raw_{method}.gexf"
output:
@@ -41,10 +41,10 @@ rule d2_simplification:
wildcard_constraints:
method="[A-Za-z0-9]+"
shell:
-"python3 deconvolution/main/d2_reduction.py -o {output.simplified_d2} {input.d2_raw}"
+"python3 deconvolution/main/lcpg_reduction.py -o {output.simplified_d2} {input.d2_raw}"
-rule d2_generation:
+rule lcpg_generation:
input:
barcode_graph=f"{WORKDIR}/{{file}}.gexf"
output:
@@ -53,7 +53,7 @@ rule d2_generation:
wildcard_constraints:
method="[A-Za-z0-9]+"
run:
-shell(f"python3 deconvolution/main/to_d2_graph.py {{input.barcode_graph}} --{{wildcards.method}} -t {{threads}} -o {WORKDIR}/{{wildcards.file}}_d2_raw_{{wildcards.method}}.gexf")
+shell(f"python3 deconvolution/main/barcode_to_lcp_graph.py {{input.barcode_graph}} --{{wildcards.method}} -t {{threads}} -o {WORKDIR}/{{wildcards.file}}_d2_raw_{{wildcards.method}}.gexf")
rule setup_workdir:
......
include: "Snakefile_data_simu"
-include: "Snakefile_d2"
-include: "Snakefile_d2_path"
+include: "Snakefile_lcpg"
+include: "Snakefile_lcp_path"
WORKDIR = "snake_experiments" if "workdir" not in config else config["workdir"]
-N = [5000]
-D = [10]
+N = [500]
+D = [6]
M = [2]
DEV = [0]
@@ -32,7 +32,7 @@ rule comparable_tsv:
splits = "/".join(path_eval_lines[-2].strip().split(': ')[-1].split(' - '))
print(f"{n}\t{m}\t{dev}\t{longest_path_d2}\t{greedy_path}\t{splits}", file=out)
-rule eval_d2:
+rule eval_lcpg:
input:
"{file}_d2_{type}_maxclq.gexf"
output:
......