Commit 34adf732 authored by Yoann Dufresne

README update

parent 03f6d8de
@@ -2,6 +2,8 @@
A compilation of scripts and pipelines to count and extract scaffolds of barcodes from linked reads datasets.
**WARNING**: This code is a proof of concept, not production-ready software. If the code is too slow for your tests or you are encountering bugs (maybe it's a feature? :p), don't hesitate to contact us via the issues or by mailing me directly (yoann [dot] dufresne [at] pasteur [dot] fr).
## Nomenclature warnings
During the process of writing a scientific article, some of the datastructure names have been modified.
In this repository the majority of the names are old names.
@@ -27,6 +29,20 @@ Install the package from the root directory.
Most of the scripts use argparse.
To see how to use a script, run it with the -h command line option.
### Test the complete pipeline on simulated data
For a complete test, we provide several snakemake files.
If you are looking for a complete pipeline starting from synthetic data generation, look at the "Snakefile_d2_eval" file.
You can vary the N (number of molecules in the interval graph), M (average number of merges to perform in a barcode) and DEV (standard deviation on the number of merges) variables to see their impact on performance.
These values are arrays: you can enter multiple values and every combination will be run.
A summary is written to the tsv file "{WORKDIR}/eval_compare_maxclique.tsv".
Warning: the pipeline can be very slow when there are many parameter combinations.
Command to run the pipeline:
```
snakemake -s Snakefile_d2_eval
```
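
To get a feel for how quickly the number of runs grows with the parameter arrays, the sketch below enumerates every combination with `itertools.product`. It only illustrates the Cartesian product: the list values and the helper name `parameter_combinations` are made up for the example, and the Snakefile itself may build its combinations differently.

```python
from itertools import product

# Hypothetical values; in Snakefile_d2_eval these would be the N, M and DEV arrays.
N = [500, 5000]
M = [2, 3]
DEV = [0, 1]


def parameter_combinations(n_values, m_values, dev_values):
    """Return every (n, m, dev) triple, i.e. the Cartesian product of the arrays."""
    return list(product(n_values, m_values, dev_values))


combos = parameter_combinations(N, M, DEV)
print(f"{len(combos)} runs to perform")
for n, m, dev in combos:
    print(f"n={n} m={m} dev={dev}")
```

The number of runs is the product of the array lengths, so adding one value to each array can multiply the total running time several times over.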
### Data simulation
* generate_fake_molecule_graph.py: Create a linear molecule graph, where the molecules are linked to the d molecules on their left and d molecules on their right.
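
The linear molecule graph described for generate_fake_molecule_graph.py can be pictured with the minimal sketch below. It is not the script's actual code: the function name `linear_molecule_graph`, the integer molecule labels and the dict-of-sets representation are assumptions chosen for illustration.

```python
def linear_molecule_graph(n, d):
    """Build adjacency sets for n molecules placed on a line.

    Each molecule i is linked to the (up to) d molecules on its left
    and the (up to) d molecules on its right. Molecules near the ends
    of the line have fewer neighbours.
    """
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        # Linking i to its right-hand neighbours also creates the
        # left-hand links of those neighbours (undirected graph).
        for j in range(i + 1, min(i + d, n - 1) + 1):
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


graph = linear_molecule_graph(n=6, d=2)
for molecule, neighbours in graph.items():
    print(molecule, sorted(neighbours))
```

With n=6 and d=2, an interior molecule such as 3 is linked to 1, 2, 4 and 5, while molecule 0 at the left end is only linked to 1 and 2.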
......
@@ -3,8 +3,8 @@ include: "Snakefile_d2"
include: "Snakefile_d2_path"
WORKDIR = "snake_experiments" if "workdir" not in config else config["workdir"]
-N = [500]
+N = [5000]
-D = [6]
+D = [10]
M = [2] M = [2]
DEV = [0] DEV = [0]
......